Re: Hash index todo list item

Поиск

Список

Период

Сортировка

От	Jens-Wolfhard Schicke
Тема	Re: Hash index todo list item
Дата	10 сентября 2007 г. 08:28:06
Msg-id	AF62AE70763EFC85F410F42A@[192.168.1.85] обсуждение исходный текст
Ответ на	Hash index todo list item (Kenneth Marshall <ktm@rice.edu>)
Список	pgsql-hackers

Дерево обсуждения

More random thoughts:

- Hash-Indices are best for unique keys, but every table needs a new hash 
key, which means one more random page access. Is there any way to build 
multi-_table_ indices? A join might then fetch all table rows with a given 
unique key after one page fetch for the combined index.

- Hashes with trivial hash-functions (like identity) can also return rows 
in a desired order.

- Is there a case where a sequentially scanning a hash-index is useful? I 
can't find any, but maybe somebody else has a use-case.

- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns. 
Assume a table like this:

CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert 
entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the 
need to check in two indices and do a bitmap OR. OTOH it might not be 
faster in any significant use cases because who'd need a link table with 
nearly unique linked objects?

- In cases where the distribution of the hash-function is good, but a small 
and relatively even number of rows exist for each key (like it might be the 
case in the above example), it might be nice to reserve a given amount of 
same-key row entries in each bucket, and hold a fill-count at the front of 
it. That would avoid costly page fetches after each collision. You'd create 
a hash-index with n-buckets, each m-elements large. When the bucket is 
full, the usual collision handling continues.

- About hash enlargement: What about always using only the first k bits of 
each hash value. When you find that the hash is "quite-full" (however that 
is defined and detected), k is increased by one, effectively doubling the 
hash size. New entries are then written as usual, while retrieving the old 
entries needs to test at the k-bit-position first and if there is a miss, 
also at the k-1-position and so forth. To limit this search, some 
background process could after analyzing the index move old entries to the 
now correct k-bit-position and increment some "min-k"-value once all old 
entries have been moved. After the hash has been increased, the index would 
approximately half it's speed for some time then. Additionally one could 
also insert the entry at the new position if it has been found at the old 
one only while using the index. A special "miss"-entry at the new position 
doesn't help if nothing could be found because the old positions will 
usually hold some data which resides there even if it uses k bits.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hash index todo list item